We provide an exact estimate on the maximal subword complexity for quasiperiodic infinite words. 
To this end we give a representation of the set of finite and of infinite words having a certain 
quasiperiod q via a finite language derived from q. It is shown that this language is a suffix code 
having a bounded delay of decipherability. 

Our estimate of the subword complexity now follows from this result, previously known results 
on the subword complexity and elementary results on formal power series. 

In his tutorial [Mar04] Solomon Marcus provided some initial facts on quasiperiodic infinite words. There 
he posed several questions on the complexity of quasiperiodic infinite words. Some answers, mainly to 
questions concerning quasiperiodic infinite words of low complexity, were given in [LR04, LR07]. 

The investigations of the present paper turn to the question of which complexity functions are maximally 
possible for those words. As complexity measure we follow Question 2 of Marcus [Mar04] and consider the 
(subword) complexity function $f(\xi,n)$ of an infinite word $\xi$, $f(\xi,n)$ being its number of subwords of 
length $n$. This subword complexity of infinite words ($\omega$-words) was mainly investigated for words 
of low (polynomial) complexity (see the tutorial [BK03] or the book [AS03]). In [Sta93, Sta97] some 
results on exponential subword complexity helpful for the present considerations are derived. 

As a final result we obtain that the maximally possible complexity functions for quasiperiodic infinite 
words $\xi$ are bounded from above by a function of the form $f(\xi,n) \le c_\xi \cdot t_P^n$ where $t_P$ is the smallest 
Pisot-Vijayaraghavan number, that is, the unique real root of the cubic polynomial $x^3 - x - 1$, which is 
approximately $t_P \approx 1.324718$. We also show that this bound is tight, that is, there are $\omega$-words 
$\xi$ having $f(\xi,n) \ge c \cdot t_P^n$. 

The paper is organised as follows. After introducing some notation we derive in Section 2 a 
characterisation of quasiperiodic words and $\omega$-words having a certain quasiperiod $q$. Moreover, we introduce 
a finite basis set $P_q$ from which the sets of quasiperiodic words or $\omega$-words having quasiperiod $q$ can be 
constructed. In Section 3 it is then proved that the star root of $P_q$ is a suffix code having a bounded delay 
of decipherability. 

These prerequisites allow us, in Section 4, to estimate the number of subwords of the language 
$Q_q$ of all quasiperiodic words having quasiperiod $q$. It turns out that $c_{q,1} \cdot \lambda_q^n \le f(Q_q,n) \le c_{q,2} \cdot \lambda_q^n$ where 
$f(Q_q,n)$ is the number of subwords of length $n$ of words in $Q_q$ and $1 \le \lambda_q \le t_P$ depends on $q$. From these 
results we derive our estimates for the subword complexity of quasiperiodic infinite words. Finally, we 
show that, for every quasiperiod $q$, there is a quasiperiodic $\omega$-word $\xi$ with quasiperiod $q$ whose subword 
complexity meets the upper bound $c_{q,2} \cdot \lambda_q^n$. 

1 Notation 

In this section we introduce the notation used throughout the paper. By $\mathbb{N} = \{0, 1, 2, \ldots\}$ we denote the 
set of natural numbers. Let $X$ be an alphabet of cardinality $|X| = r \ge 2$. By $X^*$ we denote the set of finite 
words on $X$, including the empty word $e$, and $X^\omega$ is the set of infinite strings ($\omega$-words) over $X$. Subsets 
of $X^*$ will be referred to as languages and subsets of $X^\omega$ as $\omega$-languages. 

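For $w \in X^*$ and $\eta \in X^* \cup X^\omega$ let $w \cdot \eta$ be their concatenation. This concatenation product extends 
in an obvious way to subsets $L \subseteq X^*$ and $B \subseteq X^* \cup X^\omega$. For a language $L$ let $L^* := \bigcup_{i \in \mathbb{N}} L^i$, and by 
$L^\omega := \{w_1 \cdots w_i \cdots : w_i \in L \setminus \{e\}\}$ we denote the set of infinite strings formed by concatenating words in 
$L$. Furthermore $|w|$ is the length of the word $w \in X^*$ and $\mathrm{pref}(B)$ is the set of all finite prefixes of strings 
in $B \subseteq X^* \cup X^\omega$. We shall abbreviate $w \in \mathrm{pref}(\eta)$ ($\eta \in X^* \cup X^\omega$) by $w \sqsubseteq \eta$, and write $w \sqsubset \eta$ if, in addition, $w \ne \eta$. 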

We denote by $B/w := \{\eta : w \cdot \eta \in B\}$ the left derivative of the set $B \subseteq X^* \cup X^\omega$. As usual, a language 
$L \subseteq X^*$ is regular provided it is accepted by a finite automaton. An equivalent condition is that its set of 
left derivatives $\{L/w : w \in X^*\}$ is finite. 

The sets of infixes of $B$ or $\eta$ are $\mathrm{infix}(B) := \bigcup_{w \in X^*} \mathrm{pref}(B/w)$ and $\mathrm{infix}(\eta) := \bigcup_{w \in X^*} \mathrm{pref}(\{\eta\}/w)$, 
respectively. In the sequel we assume the reader to be familiar with basic facts of language theory. 
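
As a small illustration of these definitions, the following Python sketch computes the prefix and infix sets of a single finite word and counts its subwords of a given length. The function names `pref`, `infix` and `complexity` are chosen here for illustration only, and the sketch handles finite words, not $\omega$-words.

```python
def pref(w):
    """All finite prefixes of the finite word w, including the empty word."""
    return {w[:i] for i in range(len(w) + 1)}


def infix(w):
    """All infixes (subwords) of w, i.e. the prefixes of all suffixes of w."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def complexity(w, n):
    """Number of distinct subwords of length n occurring in w."""
    return sum(1 for u in infix(w) if len(u) == n)
```

For instance, `complexity("abaab", 2)` counts the three subwords `ab`, `ba` and `aa`.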

As usual a language $L \subseteq X^*$ is called a code provided $w_1 \cdots w_l = v_1 \cdots v_k$ for $w_1, \ldots, w_l, v_1, \ldots, v_k \in L$ 
implies $l = k$ and $w_i = v_i$. 

2 Quasiperiodicity 

2.1 General properties 

A finite or infinite word $\eta \in X^* \cup X^\omega$ is referred to as quasiperiodic with quasiperiod $q \in X^* \setminus \{e\}$ 
provided for every $j < |\eta| \in \mathbb{N} \cup \{\infty\}$ there is a prefix $u_j \sqsubseteq \eta$ of length $j - |q| < |u_j| \le j$ such that 
$u_j \cdot q \sqsubseteq \eta$, that is, for every $w \sqsubset \eta$ the relation $u_{|w|} \sqsubseteq w \sqsubset u_{|w|} \cdot q$ is valid. 

For $q \in X^* \setminus \{e\}$ let $Q_q$ be the set of quasiperiodic words with quasiperiod $q$. Then $\{q\}^* \subseteq Q_q = Q_q^*$ 
and $Q_q \setminus \{e\} \subseteq X^* \cdot q \cap q \cdot X^*$. 
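
For finite words the definition amounts to saying that the occurrences of $q$ cover every position of the word. A minimal Python sketch of this test follows; the name `is_quasiperiodic` is an assumption of this illustration, not notation from the paper.

```python
def is_quasiperiodic(w, q):
    """Check whether the finite word w has quasiperiod q.

    Every position j < len(w) must lie inside an occurrence of q that
    starts at some position in the range (j - len(q), j].
    """
    if not q:
        raise ValueError("a quasiperiod must be non-empty")
    covered_up_to = 0  # all positions below this index are covered
    for start in range(len(w) - len(q) + 1):
        if w.startswith(q, start):
            if start > covered_up_to:
                return False  # position covered_up_to is left uncovered
            covered_up_to = start + len(q)
    return covered_up_to == len(w)
```

The empty word passes the test vacuously, in line with $e \in \{q\}^* \subseteq Q_q$.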

Definition 1 A family $(w_i)_{i=1}^{l}$, $l \in \mathbb{N} \cup \{\infty\}$, of words $w_i \in X^* \cdot q$ is referred to as a $q$-chain provided 
$w_1 = q$, $w_i \sqsubset w_{i+1}$ and $|w_{i+1}| - |w_i| \le |q|$. 

The following holds. 
Lemma 2 

1. $w \in Q_q \setminus \{e\}$ if and only if there is a $q$-chain $(w_i)_{i=1}^{l}$ such that $w_l = w$. 

2. An $\omega$-word $\xi \in X^\omega$ is quasiperiodic with quasiperiod $q$ if and only if there is a $q$-chain $(w_i)_{i=1}^{\infty}$ 
such that $w_i \sqsubset \xi$. 

Proof: It suffices to show how a family $(u_j)_{j < |\eta|}$ can be converted to a $q$-chain $(w_i)_{i=1}^{l}$ and vice versa. 
Consider $\eta \in X^* \cup X^\omega$ and let $(u_j)_{j < |\eta|}$ be a family such that $u_j \cdot q \sqsubseteq \eta$ and $j - |q| < |u_j| \le j$ for 
$j < |\eta|$. 

Define $w_1 := q$ and $w_{i+1} := u_{|w_i|} \cdot q$ as long as $|w_i| < |\eta|$. Then $w_i \sqsubseteq \eta$ and $|w_i| < |w_{i+1}| = |u_{|w_i|} \cdot q| \le 
|w_i| + |q|$. Thus $(w_i)_{i=1}^{l}$ is a $q$-chain with $w_i \sqsubseteq \eta$. 

Conversely, let $(w_i)_{i=1}^{l}$ be a $q$-chain such that $w_i \sqsubseteq \eta$ and set 

$u_j := \max_{\sqsubseteq} \{w' : \exists i\,(w' \cdot q = w_i \wedge |w'| \le j)\}$, for $j < |\eta|$. 

By definition, $u_j \cdot q \sqsubseteq \eta$ and $|u_j| \le j$. Assume $|u_j| \le j - |q|$ and $u_j \cdot q = w_i$. Then $|w_i| \le j < |\eta|$. 
Consequently, in the $q$-chain there is a successor $w_{i+1}$ with $|w_{i+1}| \le |w_i| + |q| \le j + |q|$. Let $w_{i+1} = w'' \cdot q$. 
Then $u_j \sqsubset w''$ and $|w''| \le j$, which contradicts the maximality of $u_j$. □ 

Corollary 3 Let $u \in \mathrm{pref}(Q_q)$. Then there are words $w, w' \in Q_q$ such that $w \sqsubseteq u \sqsubseteq w'$ and $|u| - |w|, |w'| - 
|u| \le |q|$. 

Corollary 4 Let $\xi \in X^\omega$. Then the following are equivalent. 

1. $\xi$ is quasiperiodic with quasiperiod $q$. 

2. $\mathrm{pref}(\xi) \cap Q_q$ is infinite. 

3. $\mathrm{pref}(\xi) \subseteq \mathrm{pref}(Q_q)$. 

2.2 A finite generator for quasiperiodic words 

In this part we introduce the finite language $P_q$ which generates the set of quasiperiodic words as well 
as the set of quasiperiodic $\omega$-words having quasiperiod $q$. We investigate basic properties of $P_q$ using 
simple facts from combinatorics on words (see e.g. [Shy01]). We set 

$P_q := \{v : e \sqsubset v \sqsubseteq q \sqsubset v \cdot q\}$. (1) 
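
Since $P_q$ consists of prefixes of $q$ only, Eq. (1) can be evaluated directly. The following Python sketch does so; the function name `P_q` is chosen for this illustration.

```python
def P_q(q):
    """The finite language P_q of Eq. (1), listed by increasing length.

    A non-empty prefix v of q belongs to P_q iff q is a prefix of v + q;
    the prefix is automatically proper because v is non-empty.
    """
    return [q[:i] for i in range(1, len(q) + 1) if (q[:i] + q).startswith(q)]
```

For the two quasiperiods used in the examples of Section 3, `P_q("aba")` yields `ab, aba` and `P_q("aabaaaaba")` yields `aabaa, aabaaaab, aabaaaaba`.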
Then we have the following properties. 

Proposition 5 
$Q_q = P_q^* \cdot q \cup \{e\} \subseteq P_q^*$, (2) 

$\mathrm{pref}(P_q^*) = \mathrm{pref}(Q_q) = P_q^* \cdot \mathrm{pref}(q)$ (3) 

Proof: In order to prove Eq. (2) we show that $w_i \in P_q^* \cdot q$ for every $q$-chain $(w_i)_{i=1}^{l}$. This is certainly 
true for $w_1 = q$. Now proceed by induction on $i$. Let $w_i = w'_i \cdot q \in P_q^* \cdot q$ and $w_{i+1} = w'_{i+1} \cdot q$. Then 
$w'_i \cdot v_i = w'_{i+1}$ for some word $v_i$. Now from $w_i \sqsubset w_{i+1}$ we obtain $e \sqsubset v_i \sqsubseteq q \sqsubset v_i \cdot q$, that is, $v_i \in P_q$. 

Eq. (3) is an immediate consequence of Eq. (2). □ 
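
Eq. (2) can be checked by brute force for short words. The sketch below is only a sanity check under stated assumptions: the helper names are hypothetical, the alphabet and the length bound are chosen by the caller, and `max_len` is assumed to be at least `len(q)`.

```python
from itertools import product


def is_quasiperiodic(w, q):
    """True iff the occurrences of q cover every position of w."""
    covered = 0
    for start in range(len(w) - len(q) + 1):
        if w.startswith(q, start):
            if start > covered:
                return False
            covered = start + len(q)
    return covered == len(w)


def P_q(q):
    """The language P_q of Eq. (1)."""
    return [q[:i] for i in range(1, len(q) + 1) if (q[:i] + q).startswith(q)]


def star_words(P, max_len):
    """All words of P* of length at most max_len (max_len >= 0)."""
    words, frontier = {""}, {""}
    while frontier:
        frontier = {u + v for u in frontier for v in P
                    if len(u) + len(v) <= max_len} - words
        words |= frontier
    return words


def check_eq2(q, alphabet, max_len):
    """Compare Q_q with P_q* . q + {e} on all words of length <= max_len."""
    lhs = {"".join(t) for n in range(max_len + 1)
           for t in product(alphabet, repeat=n)
           if is_quasiperiodic("".join(t), q)}
    rhs = {u + q for u in star_words(P_q(q), max_len - len(q))} | {""}
    return lhs == rhs
```

A successful check for a few quasiperiods is of course no substitute for the proof above.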
Corollary 4 and Proposition 5 imply the following characterisation of $\omega$-words having quasiperiod $q$. 

$\{\xi : \xi \in X^\omega \wedge \xi \text{ has quasiperiod } q\} = P_q^\omega$ (4) 

Proof: Since $P_q$ is finite, $P_q^\omega = \{\xi : \xi \in X^\omega \wedge \mathrm{pref}(\xi) \subseteq \mathrm{pref}(P_q^*)\}$. □ 
The following property of words in $P_q$ is a consequence of the Lyndon-Schützenberger Theorem (see 
[BP85, Shy01]). 

Proposition 6 $v \in P_q$ if and only if $|v| \le |q|$ and there is a prefix $\bar{v} \sqsubset v$ such that $q = v^k \cdot \bar{v}$ for $k = \lfloor |q|/|v| \rfloor$. 

Proof: Sufficiency is clear. Let now $v \in P_q$. Then $v \sqsubseteq q \sqsubset v \cdot q$. This implies $v^l \sqsubseteq q \sqsubset v^l \cdot q$ as long as 
$l \le k$ and, finally, $q \sqsubset v^{k+1}$. □ 

Corollary 7 $v \in P_q$ if and only if $|v| \le |q|$ and there is a $k' \in \mathbb{N}$ such that $q \sqsubseteq v^{k'}$. 

Now set $q_0 := \min_{\sqsubseteq} P_q$. Then in view of Proposition 6 and Corollary 7 we have the following. 

$q = q_0^k \cdot \bar{q}$ for $k = \lfloor |q|/|q_0| \rfloor$ and some $\bar{q} \sqsubset q_0$. (5) 
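
The decomposition of Eq. (5) is easy to compute. In the following Python sketch the name `decompose` and the returned triple are conventions of this illustration.

```python
def decompose(q):
    """Return (q0, k, q_bar) with q == q0 * k + q_bar as in Eq. (5).

    q0 is the shortest word of P_q, k = floor(|q| / |q0|) and q_bar is a
    proper prefix of q0. The word q must be non-empty.
    """
    q0 = next(q[:i] for i in range(1, len(q) + 1)
              if (q[:i] + q).startswith(q))
    k = len(q) // len(q0)
    q_bar = q[k * len(q0):]
    # Sanity check of Eq. (5) for this particular q.
    assert q == q0 * k + q_bar and q0.startswith(q_bar) and q_bar != q0
    return q0, k, q_bar
```

For $q = aabaaaaba$ this gives $q_0 = aabaa$, $k = 1$ and $\bar{q} = aaba$.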



Corollary 8 The word $q_0$ is primitive, that is, there are no $u \in X^*$ and $n > 1$ such that $q_0 = u^n$. 

Proof: Assume $q_0 = q_1^l$ for some $l > 1$. Then $q = q_1^{k \cdot l + j} \cdot \bar{q}_1$ for some $j < l$, where $\bar{q}_1 \sqsubset q_1$, and, consequently, $q \sqsubseteq q_1^{k \cdot l + j + 1}$, 
contradicting the fact that $q_0$ is the shortest word in $P_q$. □ 
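
Primitivity can be tested with a standard trick: a non-empty word $w$ is a proper power exactly if it occurs inside $w \cdot w$ at a position other than the first and the last. The Python sketch below uses this test (the function names are hypothetical) to confirm Corollary 8 on a few sample quasiperiods.

```python
def is_primitive(w):
    """True iff the non-empty word w is not of the form u^n with n > 1."""
    return bool(w) and w not in (w + w)[1:-1]


def shortest_in_P(q):
    """The word q_0, i.e. the shortest element of P_q, for non-empty q."""
    return next(q[:i] for i in range(1, len(q) + 1)
                if (q[:i] + q).startswith(q))
```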

Proposition 9 
1. If $v \in P_q$ and $w \sqsubseteq q$ then $v \cdot w \sqsubseteq q$ or $q \sqsubset v \cdot w$. 
2. If $v \in P_q$ and $|v| \le |q| - |q_0|$ then $v = q_0^m$ for some $m \in \mathbb{N}$. 

Proof: The first assertion follows from $v \sqsubseteq q \sqsubset v \cdot q$ and $v \cdot w \sqsubseteq v \cdot q$. 

For the proof of the second one observe that, by the first item, $v \cdot q_0 \sqsubseteq q$ and $q_0 \cdot v \sqsubseteq q$ whence 
$q_0 \cdot v = v \cdot q_0$. Thus $q_0$ and $v$ are powers of a common word. Since $q_0$ is primitive, the assertion follows. 
□ 

Theorem 10 If $v \in P_q$ and $w \cdot v \sqsubseteq q$ then $w \in \{q_0\}^*$. 

Proof: If $v \in P_q$ then $q_0 \sqsubseteq v$. Thus it suffices to prove the assertion for $q_0$. 

Let $w \cdot q_0 \sqsubseteq q = q_0^k \cdot \bar{q}$. Then $w \cdot q_0 \sqsubseteq q_0^{k+2}$ and, trivially, $q_0 \sqsubseteq q_0^{k+2}$. Since $|w \cdot q_0| + |q_0| \le |q_0^{k+2}|$, 
$w \cdot q_0$ and $q_0$ are powers of a common word. The assertion follows because $q_0$ is primitive. □ 



3 Codes 

In this section we investigate in more detail the properties of the star root of $P_q$, that is, of the smallest 
subset $V \subseteq P_q$ such that $V^* = P_q^*$. It turns out that $\sqrt[*]{P_q}$ is a suffix code which, additionally, has a 
bounded delay of decipherability. This delay is closely related to the largest power of $q_0$ being a prefix 
of $q$. 

According to [BP85] a subset $C \subseteq X^*$ is a code of a delay of decipherability $m \in \mathbb{N}$ if and only if 
for all $w, w', v_1, \ldots, v_m \in C$ and $u \in C^*$ the relation $w \cdot v_1 \cdots v_m \sqsubseteq w' \cdot u$ implies $w = w'$. Observe that 
$C \subseteq X^* \setminus \{e\}$ is a prefix code, that is, $w, w' \in C$ and $w \sqsubseteq w'$ imply $w = w'$, if and only if $C$ has delay 0. 
A subset $C \subseteq X^* \setminus \{e\}$ is referred to as a suffix code if no word $w \in C$ is a proper suffix of another word 
$v \in C$. 



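Define now the star root of $P_q$: 

$\sqrt[*]{P_q} := P_q \setminus (P_q^2 \cdot P_q^*)$ 

The following holds. 

$\sqrt[*]{P_q} = (P_q \setminus \{q_0\}^*) \cup \{q_0\} \subseteq \{q_0\} \cup \{v : v \sqsubseteq q \wedge |q_0| + |v| > |q|\}$ (6) 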

Proof: First we prove the identity. The inclusion "$\subseteq$" follows from $(P_q \setminus \{q_0\}^*) \cup \{q_0\} \subseteq P_q \subseteq ((P_q \setminus 
\{q_0\}^*) \cup \{q_0\})^*$. 

To prove the reverse inclusion assume $l > 1$ and $v_1 \cdots v_l \in P_q$ for $v_i \in P_q$. Then $|q_0| \le |v_i|$ and thus 
$|q_0| + |v_i| \le |q|$ for all $i$. According to Proposition 9.2 we have $v_i \in \{q_0\}^*$ which shows $P_q \cap (P_q^2 \cdot P_q^*) \subseteq 
\{q_0\}^*$. 

The remaining inclusion now follows from Proposition 9.2. □ 
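
The identity in Eq. (6) gives a direct way to compute the star root. The following Python sketch compares it with a computation from the definition; both function names are assumptions of this illustration.

```python
def P_q(q):
    """The language P_q of Eq. (1), listed by increasing length."""
    return [q[:i] for i in range(1, len(q) + 1) if (q[:i] + q).startswith(q)]


def star_root_by_definition(q):
    """Words of P_q that are not a product of two or more words of P_q."""
    P = P_q(q)

    def splits(w, parts):
        # True iff w factorises over P so that at least two factors
        # are used in total, counting the `parts` already consumed.
        if not w:
            return parts >= 2
        return any(w.startswith(v) and splits(w[len(v):], parts + 1)
                   for v in P)

    return [v for v in P if not splits(v, 0)]


def star_root_by_eq6(q):
    """(P_q minus {q0}*) together with q0, following Eq. (6)."""
    P = P_q(q)
    q0 = P[0]  # shortest element of P_q
    return [v for v in P if v == q0 or v != q0 * (len(v) // len(q0))]
```

For $q = ababa$ both functions return `ab, ababa`; the word $abab = q_0^2$ is dropped.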
Next we are going to show that $\sqrt[*]{P_q}$ is a suffix code having a bounded delay of decipherability. 

Corollary 11 $\sqrt[*]{P_q}$ is a suffix code. 

Proof: Assume $u = w \cdot v$ for some $u, v \in \sqrt[*]{P_q}$, $u \ne v$. Then Theorem 10 proves $w \in \{q_0\}^* \subseteq P_q^*$. If $w \ne e$, 
in view of $u \sqsubseteq q$ Proposition 9.2 implies $v \in \{q_0\}^*$ and hence $u = w \cdot v \in \{q_0\}^*$. Thus $u = v = q_0$, contradicting 
$u \ne v$. □ 

Theorem 12 Let $q = q_0^k \cdot \bar{q}$ where $\bar{q} \sqsubset q_0$. Then $\sqrt[*]{P_q}$ is a code having a delay of decipherability of at 
most $k+1$. 

Proof: We have to show that if the words $v \cdot w_1 \cdots w_{k+1}$ and $v' \cdot w'_1 \cdots w'_{k+1}$, where $v, w_1, \ldots, w_{k+1}, 
v', w'_1, \ldots, w'_{k+1} \in \sqrt[*]{P_q}$, are comparable w.r.t. "$\sqsubseteq$" then $v = v'$. 

Without loss of generality, assume $v \sqsubset v'$. Then $|q_0| \le |v| < |v'| \le |q|$. We have $|w_i|, |w'_i| \ge |q_0|$. 
Thus $|w_1 \cdots w_{k+1}|, |w'_1 \cdots w'_{k+1}| > |q|$. Moreover, according to Proposition 9.1, $q \sqsubset w_1 \cdots w_{k+1}$ and $q \sqsubset 
w'_1 \cdots w'_{k+1}$, whence $v \cdot q \sqsubset v' \cdot q$. Then in view of the inequality $|v| + |q| \ge |v'| + |q_0|$ we have $w \cdot q_0 \sqsubseteq q$ 
for the word $w \ne e$ with $v \cdot w = v'$ and, according to Theorem 10, $w \in \{q_0\}^*$. This contradicts the fact that 
$\sqrt[*]{P_q}$ is a suffix code. □ 
We provide examples showing that, on the one hand, the bound in Theorem 12 cannot be improved and, on the 
other hand, that it is not always attained. Since for $q = q_0^k$, $k \in \mathbb{N}$, the code $\sqrt[*]{P_q} = \{q_0\}$ is a prefix code, 
we consider only non-trivial cases. 

Example 13 Let $q := aabaaaaba$. Then $q_0 = aabaa$, $k = 1$ and $\sqrt[*]{P_q} = P_q = \{q_0, aabaaaab, q\}$ which 
is a code having a delay of decipherability 2. 

Indeed $aabaaaabaa = q_0 \cdot q_0 \sqsubseteq q \cdot q_0$ or 

$aabaaaabaa = q_0 \cdot q_0 \sqsubseteq aabaaaab \cdot q_0$. 

Moreover $q \cdot q_0 \notin Q_q$. Thus Example 13 shows also that $q \cdot P_q^*$ need not be contained in $Q_q$. 

Example 14 Let $q := aba$. Then $k = 1$ and $P_q = \{ab, aba\}$ is a code having a delay of decipherability 1. 
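
Both examples can be confirmed by exhaustive search, since the delay condition quoted from [BP85] involves only finitely many left-hand sides $w \cdot v_1 \cdots v_m$. The Python sketch below is a brute-force check for a finite set of non-empty words; the names `is_prefix_of_star` and `has_delay` are hypothetical.

```python
from itertools import product


def is_prefix_of_star(r, C):
    """True iff r is a prefix of some word in C* (C finite, no empty word)."""
    if not r:
        return True
    return any(c.startswith(r) or
               (r.startswith(c) and is_prefix_of_star(r[len(c):], C))
               for c in C)


def has_delay(C, m):
    """Brute-force test of the delay condition with parameter m.

    Looks for w != w2 in C and v_1..v_m in C such that w v_1 ... v_m is a
    prefix of w2 u for some u in C*; the condition holds iff none exists.
    """
    for w in C:
        for vs in product(C, repeat=m):
            s = w + "".join(vs)
            for w2 in C:
                if w2 == w:
                    continue
                if w2.startswith(s):
                    return False
                if s.startswith(w2) and is_prefix_of_star(s[len(w2):], C):
                    return False
    return True
```

With `C13 = ["aabaa", "aabaaaab", "aabaaaaba"]` the check fails for `m = 1` and succeeds for `m = 2`; with `["ab", "aba"]` it fails for `m = 0` and succeeds for `m = 1`.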



4 Subword Complexity 

In this section we investigate the subword complexity of the language Q q . To this end we derive general 
relations between the numbers of words of a certain length for regular languages, their prefix- and their 
infix-languages. Then using elementary methods of the theory of formal power series (cf. [BP85, SS78]) 
we estimate values characterising the exponential growth of the family $(|\mathrm{infix}(Q_q) \cap X^n|)_{n \in \mathbb{N}}$. 
We start with some prerequisites on the number of subwords of regular star-languages. 

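Lemma 15 If $L \subseteq X^*$ is a regular language then there is a $k \in \mathbb{N}$ such that 

$|L \cap X^n| \le |\mathrm{pref}(L) \cap X^n| \le \sum_{i=0}^{k} |L \cap X^{n+i}|$ 
$|\mathrm{pref}(L) \cap X^n| \le |\mathrm{infix}(L) \cap X^n| \le k \cdot |\mathrm{pref}(L) \cap X^n|$ (7) 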

As a suitable k one may choose the number of states of an automaton accepting the language L C X*. 

Moreover, Corollary 4 of [Sta85] shows that for every regular language $L \subseteq X^*$ there are constants 
$c_1, c_2 > 0$ and a $\lambda \ge 1$ such that 

$c_1 \cdot \lambda^n \le |\mathrm{pref}(L^*) \cap X^n| \le c_2 \cdot \lambda^n$. (8) 
A consequence of Lemma 15 is that Eq. (8) holds also (with constant $k \cdot c_2$ instead of $c_2$) for $\mathrm{infix}(L^*)$. 



4.1 The subword complexity of Q q 

It is now our task to estimate the value $\lambda_q$ which satisfies $c_1 \cdot \lambda_q^n \le |\mathrm{infix}(P_q^*) \cap X^n| \le k \cdot c_2 \cdot \lambda_q^n$. Following 
Lemma 15, Eq. (8) and Proposition 5 it holds 

$\lambda_q = \limsup_{n \to \infty} \sqrt[n]{|P_q^* \cap X^n|}$ (9) 

which is the inverse of the convergence radius $\mathrm{rad}\, s_q^*$ of the power series $s_q^*(t) := \sum_{n \in \mathbb{N}} |P_q^* \cap X^n| \cdot t^n$ (the 
structure generating function of the language $P_q^*$). 

If $|q_0|$ divides $|q|$ then $P_q^* = \{q_0\}^*$ whence $\lambda_q = 1$. Therefore, in the following considerations we 
may assume that $|q|/|q_0| \notin \mathbb{N}$. 

Since $\sqrt[*]{P_q}$ is a code, we have $s_q^*(t) = \frac{1}{1 - s_q(t)}$ where $s_q(t) := \sum_{v \in \sqrt[*]{P_q}} t^{|v|}$ is the structure generating 
function of the finite language $\sqrt[*]{P_q}$. Thus the convergence radius $\mathrm{rad}\, s_q^*$ is the smallest root of $1 - s_q(t)$. 
It is readily seen that this root is positive. So $\lambda_q$ is the largest positive root of the reversed polynomial 
$p_q(t) := t^{|q|} - \sum_{v \in \sqrt[*]{P_q}} t^{|q| - |v|}$. Summarising these observations we obtain the following. 

Lemma 16 Let $q \in X^* \setminus \{e\}$. Then there are constants $c_{q,1}, c_{q,2} > 0$ such that the structure function of 
the language $\mathrm{infix}(Q_q)$ satisfies 

$c_{q,1} \cdot \lambda_q^n \le |\mathrm{infix}(Q_q) \cap X^n| \le c_{q,2} \cdot \lambda_q^n$ 
where $\lambda_q$ is the largest (positive) root of the polynomial $p_q(t)$. 
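
The quantities in this subsection are easy to compute for a concrete quasiperiod. The Python sketch below rests on two facts used above: the star root is a code, so the numbers $|P_q^* \cap X^n|$ satisfy a linear recurrence over the lengths of its words, and $\lambda_q$ is the value $t \ge 1$ at which $s_q(1/t) = 1$. The function names and the bisection tolerance are assumptions of this illustration.

```python
def star_root_lengths(q):
    """Lengths of the words in the star root of P_q, via Eq. (6)."""
    P = [q[:i] for i in range(1, len(q) + 1) if (q[:i] + q).startswith(q)]
    q0 = P[0]
    return [len(v) for v in P if v == q0 or v != q0 * (len(v) // len(q0))]


def count_star_words(q, n_max):
    """a[n] = number of words of length n in P_q*, for n = 0..n_max.

    Uses unique factorisation over the star root (it is a code):
    a[n] = sum of a[n - |v|] over all v in the star root.
    """
    lengths = star_root_lengths(q)
    a = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        a[n] = sum(a[n - l] for l in lengths if l <= n)
    return a


def growth_rate(q, tol=1e-12):
    """Approximate lambda_q by bisection on s_q(1/t) = 1 over [1, 2]."""
    lengths = star_root_lengths(q)
    lo, hi = 1.0, 2.0  # s_q(1/1) >= 1 and s_q(1/2) < 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(mid ** -l for l in lengths) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $q = aba$ the counts are `1, 0, 1, 1, 1, 2, 2, 3, ...` and the computed growth rate is close to 1.324718.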

Remark. One could prove Lemma 16 by showing that, for each polynomial $p_q(t)$, its largest (positive) 
root has multiplicity 1. Referring to Corollary 4 of [Sta85] (see Eq. (8)) we avoided these more detailed 
considerations of a particular class of polynomials. 

In order to facilitate the search for the maximum of the values $\lambda_q$ we may restrict our considerations 
to the case when $|q_0| > |q|/2$. 

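Lemma 17 If $|q_0|$ does not divide $|q|$ and the language $P_q^*$ is maximal w.r.t. "$\subseteq$" in the class $\{P_{q'}^* : q' \in 
X^* \setminus \{e\}\}$ then $|q_0| > |q|/2$. 

Proof: If $|q|/|q_0| \notin \mathbb{N}$ and $|q_0| < |q|/2$ we have $q = q_0^k \cdot \bar{q}$ for $k \ge 2$ and $e \ne \bar{q} \sqsubset q_0$. Then, obviously, 
$P_q^* \subset P_{q'}^*$ for $q' := q_0 \cdot \bar{q}$. □ 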

From $|q_0| > |q|/2$ we obtain that $p_q(t)$ has the form $t^n - \sum_{i \in M} t^i$ where $0 \in M \subseteq \{j : j \le \frac{n-1}{2}\}$. In [Pol09] 
the following properties were derived. 

Lemma 18 Let $\mathcal{P} := \{t^n - \sum_{i \in M} t^i : n \ge 1 \wedge 0 \in M \subseteq \{j : j \le \frac{n-1}{2}\}\}$. Then 

1. for every $n \ge 1$ the polynomial $t^n - \sum_{i=0}^{\lfloor (n-1)/2 \rfloor} t^i$ has the largest positive root among all polynomials of 
degree $n$ in $\mathcal{P}$, and 

2. the polynomials $t^3 - t - 1$ and $t^5 - t^2 - t - 1 = (t^2 + 1) \cdot (t^3 - t - 1)$ have the largest positive roots 
among all polynomials in $\mathcal{P}$. 

Two remarks are in order here. 

1. It holds $p_{a^n b a^n}(t) = t^{2n+1} - \sum_{i=0}^{n} t^i$ and $p_{a^n b^2 a^n}(t) = t^{2n+2} - \sum_{i=0}^{n} t^i$, so for all degrees $\ge 1$ there are 
polynomials of the form $p_q(t)$ in $\mathcal{P}$. 

2. The positive root $t_P$ of $p_{aba}(t) = t^3 - t - 1$ (or of $p_{a^2 b a^2}(t)$) is known as the smallest Pisot-Vijayaraghavan 
number, that is, a positive root $> 1$ of a polynomial with integer coefficients all of whose conjugates 
have modulus smaller than 1. 
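
Lemma 18 can be explored numerically for small degrees. The Python sketch below enumerates the polynomials $t^n - \sum_{i \in M} t^i$ with $0 \in M \subseteq \{j : j \le (n-1)/2\}$ and locates their positive root by bisection on $[1, 2]$, where the polynomial is non-positive at 1 and positive at 2. The function names, the degree range and the tolerance are assumptions of this illustration, and the computation is a plausibility check, not a proof.

```python
from itertools import combinations


def largest_root(n, M, tol=1e-12):
    """Positive root of t^n - sum(t^i for i in M), found by bisection."""
    lo, hi = 1.0, 2.0  # p(1) = 1 - |M| <= 0 and p(2) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** n - sum(mid ** i for i in M) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def family(n):
    """All admissible exponent sets M for degree n (each contains 0)."""
    rest = range(1, (n - 1) // 2 + 1)
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            yield (0,) + extra


def full_set(n):
    """The largest admissible M, i.e. {0, 1, ..., floor((n-1)/2)}."""
    return tuple(range((n - 1) // 2 + 1))
```

In this range the largest roots occur for degrees 3 and 5 and agree with $t_P \approx 1.324718$.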

Before proceeding to the proof of Lemma 18 we recall that the polynomials $p(t) \in \mathcal{P}$ have the following 
easily verified property. 

If $\varepsilon > 0$ and $p(t') \ge 0$ for some $t' > 0$ then $p((1 + \varepsilon) \cdot t') > 0$. (10) 

Since $p(0) = -1 < 0$ for $p(t) \in \mathcal{P}$, Eq. (10) shows that once $p(t') \ge 0$, $t' > 0$, the polynomial has 
no further root in the interval $(t', \infty)$. 

Footnote 1: If $|q_0|$ divides $|q|$ we have $p_q(t) = t^{|q_0|} - 1$ instead. 

Proof: Using Eq. (10), the first assertion is easy to verify. 

To show the second one it suffices to show that $p_n(t_P) > 0$ for every polynomial of the form $p_n(t) := 
t^n - \sum_{i=0}^{\lfloor (n-1)/2 \rfloor} t^i$ other than $t^3 - t - 1$ or $t^5 - t^2 - t - 1$. 

For degrees $n = 1, 2$ or $n = 4$ this is readily seen. 

Now we proceed by induction on $n$. To this end we observe the following property of the family 
$(p_n(t))_{n \ge 1}$. 

$p_{n+2}(t) - p_n(t) = t^{n+2} - t^n - t^{\lfloor (n+1)/2 \rfloor}$ for $n \ge 3$ (11) 

From this one easily obtains that $p_{n+2}(t_P) - p_n(t_P) = t_P^{n-1} - t_P^{\lfloor (n+1)/2 \rfloor} > 0$ for $n \ge 4$, and the assertion follows 
by induction. □ 
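
The last step of the proof uses only the defining relation $t_P^3 = t_P + 1$. Written out for $p_n(t) = t^n - \sum_{i=0}^{\lfloor (n-1)/2 \rfloor} t^i$, a short derivation reads as follows.

```latex
\begin{align*}
t_P^3 = t_P + 1 \;&\Longrightarrow\; t_P^2 - 1 = t_P^{-1},\\
p_{n+2}(t_P) - p_n(t_P)
  &= t_P^{\,n}\,\bigl(t_P^2 - 1\bigr) - t_P^{\lfloor (n+1)/2 \rfloor}
   = t_P^{\,n-1} - t_P^{\lfloor (n+1)/2 \rfloor},\\
n - 1 > \lfloor (n+1)/2 \rfloor \;&\Longleftrightarrow\; n \ge 4 .
\end{align*}
```

For $n = 3$ the difference vanishes, in line with $t^5 - t^2 - t - 1 = (t^2 + 1) \cdot (t^3 - t - 1)$.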



4.2 The subword complexity of co-words 

Having derived the results on the subword complexity of quasiperiodic words we are now in a position 
to contribute to an answer to Question 2 in [Mar04] by deriving tight upper bounds on the subword 
complexity of quasiperiodic infinite words. 

To this end we recall that $\mathrm{infix}(\xi) \subseteq \mathrm{infix}(Q_q)$ for every $\omega$-word $\xi$ with quasiperiod $q$. Thus we 
obtain the following upper bound. 

Lemma 19 If $\xi \in X^\omega$ is quasiperiodic with quasiperiod $q$ then $f(\xi,n) = |\mathrm{infix}(\xi) \cap X^n| \le c \cdot \lambda_q^n$ for a 
suitable constant $c > 0$ not depending on $\xi$. 

Following the proof of Proposition 5.5 in [Sta93] it can be shown that this upper bound is tight. 

Lemma 20 For every quasiperiod $q \in X^* \setminus \{e\}$ there is a $\xi \in P_q^\omega$ such that $c_{q,1} \cdot \lambda_q^n \le f(\xi,n) = |\mathrm{infix}(\xi) \cap 
X^n|$. 

Here $c_{q,1}$ is the constant mentioned in Lemma 16. 

Proof: Let $P_q^* = \{v_0, v_1, v_2, \ldots\}$ and define $\xi := \prod_{i \in \mathbb{N}} v_i$. 
Then obviously $\mathrm{infix}(\xi) = \mathrm{infix}(P_q^*) = \mathrm{infix}(Q_q)$. □ 
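
The construction in this proof can be imitated on a finite scale by concatenating all non-empty words of $P_q^*$ up to a length bound. The Python sketch below does this; the enumeration order (by length, then lexicographically), the length bound and the function names are assumptions of this illustration, and the result is only a finite prefix of one possible $\xi$.

```python
def star_words(P, max_len):
    """All words of P* of length at most max_len (max_len >= 0)."""
    words, frontier = {""}, {""}
    while frontier:
        frontier = {u + v for u in frontier for v in P
                    if len(u) + len(v) <= max_len} - words
        words |= frontier
    return words


def xi_prefix(q, max_len):
    """Concatenate the non-empty words of P_q* of length <= max_len."""
    P = [q[:i] for i in range(1, len(q) + 1) if (q[:i] + q).startswith(q)]
    ordered = sorted(star_words(P, max_len) - {""}, key=lambda w: (len(w), w))
    return "".join(ordered)


def complexity(w, n):
    """Number of distinct subwords of length n in the finite word w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})
```

By construction every word of $P_q^*$ up to the chosen length occurs as an infix of the result, so `complexity(xi_prefix(q, m), n)` is at least the number of such words of length `n` for `n <= m`.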
An overall upper bound on the subword complexity of quasiperiodic $\omega$-words now follows from 
Lemma 18. 

Theorem 21 There is a constant $c > 0$ such that for every quasiperiodic $\omega$-word $\xi \in X^\omega$ there is an 
$n_\xi \in \mathbb{N}$ such that $f(\xi,n) = |\mathrm{infix}(\xi) \cap X^n| \le c \cdot t_P^n$ for all $n \ge n_\xi$. 

We conclude this section with the following remark. 

Remark. Theorem 21 is independent of the size of the alphabet $X$. And indeed, quasiperiodic $\omega$-words 
of maximal subword complexity have quasiperiods of the form $aba$ or $aabaa$, $a, b \in X$, $a \ne b$ (see the 
remark after Lemma 18), and thus consist of only two different letters. 



5 Concluding Remark 

In the present paper we investigated the maximally achievable subword complexity for quasiperiodic 
infinite words. It should be mentioned that using results of [Sta93] the bounds obtained here can be 
extended to the Kolmogorov complexity of infinite words. 

In [Sta93, Section 5] the asymptotic subword complexity of an $\omega$-word $\xi \in X^\omega$ was introduced as 

$\tau(\xi) := \lim_{n \to \infty} \frac{\log_{|X|} |\mathrm{infix}(\xi) \cap X^n|}{n}$ 

and it was shown that $\tau(\xi)$ is an upper bound to the asymptotic upper and 
lower Kolmogorov complexities of infinite words: 

$\underline{\kappa}(\xi) \le \kappa(\xi) \le \tau(\xi)$. 

Moreover, from the results of [Sta93, Section 4] it follows that for every quasiperiodic word $q$ there is 
a $\xi \in P_q^\omega$ such that $\underline{\kappa}(\xi) = \tau(\xi) = \log_{|X|} \lambda_q$, that is, a quasiperiodic $\omega$-word having quasiperiod $q$ of 
maximally possible asymptotic (lower) Kolmogorov complexity. 
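
As a numerical illustration of the value $\log_{|X|} \lambda_q$, the following Python sketch evaluates it for the extremal case $\lambda_q = t_P$ over a binary alphabet. The function name and the rounded input value 1.324718 are assumptions of this illustration.

```python
import math


def asymptotic_complexity(lam, alphabet_size):
    """The value log_{|X|}(lambda) for a growth rate lambda >= 1."""
    return math.log(lam, alphabet_size)


# Extremal case over a binary alphabet: lambda_q = t_P, roughly 1.324718.
tau_binary = asymptotic_complexity(1.324718, 2)
```

Here `tau_binary` is roughly 0.4057, and the value decreases as the alphabet grows, since the base of the logarithm is $|X|$.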